Journal article
Large-scale pattern search using reduced-space on-disk suffix arrays
S Gog, A Moffat, JS Culpepper, A Turpin, A Wirth
IEEE Transactions on Knowledge and Data Engineering | Published : 2014
Abstract
The suffix array is an efficient data structure for in-memory pattern search. Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers. In this paper, we describe a new two-level suffix array-based index structure that requires significantly less disk space than previous approaches. Key to the saving is the use of disk blocks that are based on prefixes rather than the more usual uniform-sampling approach, allowing reductions between blocks and subparts of other blocks. We also describe a new in-memory structure, the condensed BWT, and show that it allows common patterns to be resolved...
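As background for the abstract's opening sentence, the following is a minimal sketch of plain in-memory suffix array pattern search. It is not the paper's two-level on-disk structure or the condensed BWT, and it is not taken from the RoSA code; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive sort-based construction is assumed only for brevity.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text, in lexicographic order.

    Naive construction that sorts the suffixes directly; fine for small
    inputs, far too slow for the large collections the paper targets.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of every occurrence of pattern in text."""
    m = len(pattern)

    # Binary search for the first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Binary search for the first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

All suffixes that begin with the pattern occupy one contiguous range of the suffix array, so two binary searches are enough to find every occurrence. In a two-level external-memory arrangement of the kind the abstract mentions, a small in-memory index would first narrow the search to a disk block of suffix pointers; that step is not shown here.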
Grants
Awarded by Victorian Life Sciences Computation Initiative (VLSCI)
Funding Acknowledgements
This work was supported by the Australian Research Council. The RoSA software is available at https://github.com/simongog/RoSA. This research was supported by a Victorian Life Sciences Computation Initiative (VLSCI) grant number VR0052 on its Peak Computing Facility at The University of Melbourne, an initiative of the Victorian Government, Australia.